Add DeltaTriplesManager #1603
Conversation
Signed-off-by: Johannes Kalmbach <[email protected]>
TODO: Write comments etc. Signed-off-by: Johannes Kalmbach <[email protected]>
Signed-off-by: Johannes Kalmbach <[email protected]>
Signed-off-by: Johannes Kalmbach <[email protected]>
Signed-off-by: Johannes Kalmbach <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #1603 +/- ##
==========================================
- Coverage 89.17% 89.16% -0.01%
==========================================
Files 372 372
Lines 34579 34635 +56
Branches 3912 3912
==========================================
+ Hits 30835 30883 +48
- Misses 2471 2479 +8
Partials 1273 1273
☔ View full report in Codecov by Sentry.
Some SPARQL operations may result in both quad insertions and deletions. Is it possible that other operations acquire the lock in between these two parts of a single update?
Signed-off-by: Johannes Kalmbach <[email protected]>
Signed-off-by: Johannes Kalmbach <[email protected]>
1-1 with Johannes, round 1
We discussed some changes.
This is awesome!
Signed-off-by: Johannes Kalmbach <[email protected]>
# Conflicts:
#   test/DeltaTriplesTest.cpp
Signed-off-by: Johannes Kalmbach <[email protected]>
DeltaTriplesManager
Looks great now. I have committed a few more minor changes.
I have two questions, which I marked as TODO(Hannah); please explain.
Signed-off-by: Johannes Kalmbach <[email protected]>
Conformance check passed ✅ No test result changes.
Quality Gate passed
Thanks for the answers and the extended comments, this looks great now.
One more question: CodeCov complains and some of the untested lines look testable. What do you think, should we still address (some of) them in this PR?
@hannahbast I have also looked at the lines, but they are either completely unrelated to this PR, or they will be addressed anyway in one of the follow-up PRs.
Since #1355, the query `LIMIT` is clamped to the value of the QLever-specific `send` parameter. In particular, this broke the display of the result size in the QLever UI, which sets `send=100` when showing the first page of results for a query. This change restores the old behavior of the `send` parameter, yet makes good use of the new lazy query processing. Specifically, lazily computed result blocks are now processed as follows: (1) the first result blocks before OFFSET are computed but skipped; (2) then results are computed and materialized until the value of the `send` parameter is reached; (3) then results are computed and counted but *not* materialized until the LIMIT is reached; (4) all remaining blocks are not even computed.

The QLever JSON now has two new fields, `resultSizeTotal` and `resultSizeExported`, with the corresponding values. For the sake of backwards compatibility, the old `resultsize` field is still there and has the same value as the `resultSizeTotal` field. For the same reason, the `send` parameter keeps its name for now, but should be renamed to `exportLimit` eventually. As a side fix, dropped the hard limit of `MAX_NOF_ROWS_IN_RESULT = 1'000'000` for JSON results. Also fix the compilation error introduced by the interplay of the merge of #1603 and #1607. Fixes #1605 and #1455.
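To make the four phases concrete, here is a minimal, self-contained C++ sketch of how lazily computed blocks could be processed as described above. All type, function, and parameter names (`Block`, `processBlocks`, `exportLimit`, ...) are illustrative placeholders and are not taken from the actual QLever export code.

```cpp
#include <cstddef>
#include <vector>

// One lazily computed result block; each entry of `rows_` stands for one result row.
struct Block {
  std::vector<int> rows_;
};

struct ExportStats {
  std::vector<int> exportedRows_;    // rows materialized for the client
  std::size_t resultSizeTotal_ = 0;  // rows counted up to LIMIT
};

// `offset`, `exportLimit`, and `limit` play the roles of OFFSET, the `send`
// parameter, and LIMIT from the description above.
ExportStats processBlocks(const std::vector<Block>& blocks, std::size_t offset,
                          std::size_t exportLimit, std::size_t limit) {
  ExportStats stats;
  std::size_t skipped = 0;
  for (const auto& block : blocks) {
    if (stats.resultSizeTotal_ >= limit) break;  // (4) remaining blocks are not computed
    for (int row : block.rows_) {
      if (skipped < offset) { ++skipped; continue; }  // (1) skip rows before OFFSET
      if (stats.resultSizeTotal_ >= limit) break;     // stop counting at LIMIT
      ++stats.resultSizeTotal_;
      if (stats.exportedRows_.size() < exportLimit) {
        stats.exportedRows_.push_back(row);           // (2) materialize until `send` is reached
      }
      // (3) beyond `send`: only count, do not materialize
    }
  }
  return stats;
}
```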
The already existing `DeltaTriples` class maintains a dynamically changing set of insertions and deletions relative to the original input data, together with a (single) local vocab. The class is not thread-safe and has to be used with care. In particular, concurrent update queries have to be serialized, and while a query makes use of the "delta triples", it has to be guaranteed that they are not changed over the course of the processing of that query.

Both of these problems are solved by the new `DeltaTriplesManager` class. The index has a single object of this class. It maintains a single `DeltaTriples` object, write access to which is strictly serialized. Each new query gets a so-called *snapshot* of the current delta triples. This is a full copy (of the delta triples located in each of the permutations and of the local vocab). These snapshots are read-only and multiple queries can share the same snapshot. A snapshot lives as long as at least one query using it is still being processed.
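For illustration, here is a minimal sketch of the snapshot pattern described above, assuming simplified placeholder types; the member and function names do not mirror the actual QLever classes.

```cpp
#include <functional>
#include <memory>
#include <mutex>

struct DeltaTriples { /* located insertions, deletions, and the local vocab */ };

class DeltaTriplesManager {
  DeltaTriples deltaTriples_;  // the single, mutable instance
  std::shared_ptr<const DeltaTriples> currentSnapshot_ =
      std::make_shared<const DeltaTriples>();
  mutable std::mutex mutex_;

 public:
  // Serialize all writers: apply the update and then publish a fresh,
  // immutable copy as the new snapshot.
  void modify(const std::function<void(DeltaTriples&)>& update) {
    std::lock_guard lock{mutex_};
    update(deltaTriples_);
    currentSnapshot_ = std::make_shared<const DeltaTriples>(deltaTriples_);
  }

  // Each new query grabs the current snapshot; the shared_ptr keeps it alive
  // for as long as at least one query using it is still being processed.
  std::shared_ptr<const DeltaTriples> getCurrentSnapshot() const {
    std::lock_guard lock{mutex_};
    return currentSnapshot_;
  }
};
```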
PRs #1582 and #1603 gave all index-lookup methods access to a snapshot of the (located) delta triples. With this change, these triples are now merged with the original triples during query processing whenever necessary. When an index block does not contain any located triples, the performance for accessing that block is the same as before. The methods for obtaining the result size of an index scan now come in two versions: one for obtaining an approximate size (this is cheap because it can be computed from the metadata of the blocks and the located triples), and one for obtaining the exact size (if there are located triples, this is expensive because it requires reading and decompressing a block and merging the located triples).
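A hedged sketch of the difference between the two size methods described above; the types, member names, and the placeholder merge helper are invented for illustration and do not reflect the actual QLever classes.

```cpp
#include <cstddef>
#include <vector>

struct BlockMetadata {
  std::size_t numRows_;            // known from the block metadata alone
  std::size_t numLocatedTriples_;  // delta triples located inside this block
};

struct IndexScanSketch {
  std::vector<BlockMetadata> blocks_;

  // Cheap: an upper bound computed from metadata only; no block is read.
  std::size_t getSizeEstimate() const {
    std::size_t size = 0;
    for (const auto& block : blocks_) {
      size += block.numRows_ + block.numLocatedTriples_;
    }
    return size;
  }

  // Expensive: blocks with located triples have to be read, decompressed,
  // and merged with the delta triples to obtain the exact count.
  std::size_t getExactSize() const {
    std::size_t size = 0;
    for (const auto& block : blocks_) {
      if (block.numLocatedTriples_ == 0) {
        size += block.numRows_;  // same cost as before this change
      } else {
        size += decompressMergeAndCount(block);  // hypothetical helper
      }
    }
    return size;
  }

  // Stand-in for the expensive path; a real implementation would decompress
  // the block and apply the located insertions and deletions while counting.
  std::size_t decompressMergeAndCount(const BlockMetadata& block) const {
    return block.numRows_;  // placeholder
  }
};
```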
Add the missing code in `Server.cpp` and `Server.h` to process a parsed update request using the `DeltaTriplesManager` from #1603. As of this commit, QLever has initial beta support for a subset of SPARQL UPDATE, with the following limitations (all of which will be lifted in the future, and the list is far from complete):
1. Updates are not yet persistent, so a restart or crash of the QLever server will delete all updates.
2. Only a single update request per query is allowed, not the syntax that atomically chains an arbitrary sequence of updates.
3. Only INSERT and DELETE queries are supported; support for queries that drop or add a complete graph will be added in the future.
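As a rough, self-contained illustration of the "serialize writers" flow described above: the following sketch shows how a parsed update with insert and delete triples could be applied under a single write lock. None of these names are taken from `Server.cpp`; they are hypothetical stand-ins for the real interface.

```cpp
#include <mutex>
#include <string>
#include <vector>

struct Triple {
  std::string subject_, predicate_, object_;
};

struct ParsedUpdate {
  std::vector<Triple> toInsert_;  // triples from the INSERT clause
  std::vector<Triple> toDelete_;  // triples from the DELETE clause
};

struct DeltaState {
  std::vector<Triple> inserted_;
  std::vector<Triple> deleted_;
};

class UpdateProcessor {
  DeltaState state_;
  std::mutex writeMutex_;  // serializes concurrent update requests

 public:
  // Applies a single parsed update atomically with respect to other updates.
  // Note that updates are not yet persistent: a restart loses this state.
  void process(const ParsedUpdate& update) {
    std::lock_guard lock{writeMutex_};
    for (const auto& t : update.toInsert_) state_.inserted_.push_back(t);
    for (const auto& t : update.toDelete_) state_.deleted_.push_back(t);
  }
};
```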